Abstract


HanDles is a document visualization tool developed by Ohio State University for DRDC Toronto. One aspect of documents that might interest analysts is the extent to which they express positive or negative opinion or sentiment toward some issue or group. In this report, we describe how HanDles was extended to classify documents as containing predominantly positive or negative sentiment. To do so, we trained the semantic model underlying HanDles' understanding of the document collection to distinguish positive from negative documents. Our tests of the system suggested that its ability to discriminate positive from negative documents would be greatly improved by selecting a training collection that is similar in nature and content to the documents that will be evaluated in operational settings.


Résumé


HanDles is a document visualization tool designed by Ohio State University for DRDC Toronto. One characteristic of documents that may interest analysts is the degree of positive or negative opinion they convey toward certain issues or groups. In this report, we describe how we enhanced HanDles so that it can classify documents according to whether positive or negative sentiment predominates in their content. To do so, we trained the semantic model underlying HanDles' understanding of the document collection to distinguish positive documents from negative ones. Our tests of the system lead us to believe that its ability to differentiate documents by the sentiment they convey can be increased considerably by choosing a training collection whose nature and content resemble those of the documents that will be evaluated in an operational context.
Executive summary


Adding a Capability to Extract Sentiment from Text using HanDles:

Peter Kwantes; Benjamin Stone; Jihun Hamm; Peter Kwantes; DRDC Toronto
TM 2012-063; Defence R&D Canada - Toronto; May 2012.

Introduction or background: HanDles is a document visualization tool developed for DRDC Toronto as part of ARP project 15ah. In this project, HanDles was augmented with the capability to classify documents as expressing either positive or negative sentiment. The capability was added so that the tool could be used in Influence Operations contexts in which analysts want to measure the extent to which issues or groups of interest are viewed favourably or unfavourably. As a test case, we trained HanDles to distinguish good from poor film reviews. We then tested it three times to see how well it classified documents. The first test was conducted on reviews of the Amazon Kindle. The second was run on text segments of the original training set of movie reviews. Finally, it was tested on a set of movie reviews that it had not seen before.

Results: In general, HanDles did a poor job of detecting the sentiment associated with the reviews of the Amazon Kindle. We attribute this poor performance to the fact that movie and product reviews discuss different issues, so there is limited similarity between the two classes of document. Not surprisingly, HanDles did a good job of classifying text segments of the original training set. However, this finding demonstrated that, unlike many other sentiment analysis tools that only classify text at the whole-document level, HanDles can be used effectively to extract the issues discussed within documents and to assign sentiment to each of them. For example, a film review might be classified as negative overall, but HanDles can determine that the acting was good while the directing was poor. Finally, when we tested HanDles on a new set of movie reviews it had not seen before, it performed with 93.3% accuracy.

Significance: The results of our trial suggest that HanDles could be a powerful tool for extracting sentiment and other higher-level properties from reports or intercepted media. What will be vital, however, is that the system first be trained on the right kinds of documents. In other words, for the system to work properly, there must be some similarity between the documents used during training and those encountered in the operational context.

Future plans: To transition HanDles into operational use, DRDC and CF stakeholders must decide which documents are most appropriate for training the system to tell the difference between positive and negative opinion in text. Once a class of document has been chosen, we can proceed to train the system and trial it in a more realistic context.
TM 2012-063; Defence R&D Canada - Toronto; May 2012.

Introduction or background: HanDles is a document visualization tool designed for DRDC Toronto under ARP 15ah. In this project, we added to HanDles the capability to classify documents according to the positive or negative impression they convey. We made this improvement so that the tool could be used in influence operations in which analysts wish to measure the extent of favourable and unfavourable opinion toward given issues or groups of interest. In the test scenario, we trained HanDles to distinguish between good and bad film reviews. We ran this scenario three times to check how well it classifies documents. The first test used reviews of Amazon's Kindle device. The second focused on text extracts drawn from the training collection of film reviews. Finally, we carried out the third test using a collection of film reviews that HanDles had never processed before.

Results: Overall, HanDles did not really manage to detect the sentiment associated with the Kindle reviews. We attribute this poor performance to the fact that film reviews and device reviews deal with different issues, so there are few similarities between these two categories of documents. As we expected, it did a good job of classifying the text segments taken from the training collection. However, our results demonstrate that, unlike many other sentiment analysis tools that only classify complete texts, HanDles is able to correctly extract the points discussed in documents and assign a sentiment to them. For example, for a film review classified as negative overall, it can determine that the acting was good but the directing mediocre. Finally, when we tested HanDles on a collection of film reviews it had never processed before, it classified the documents with an accuracy of 93.3%.

Significance: The results of our trials lead us to believe that HanDles could be a powerful tool for extracting the sentiments expressed in reports or intercepted media. It is essential, however, to train the system with the right kinds of documents. In other words, for the tool to work properly, there must be some similarity between the texts used to train HanDles and those processed in an operational context.

Future plans: To transition HanDles into an operational environment, DRDC and CF stakeholders must decide which documents are most appropriate for training the system so that it can properly distinguish positive opinions from negative ones. Once they have chosen a category of text, we will be able to proceed with training and testing the system in a more realistic environment than those used to date.
1 Introduction


1.1 Purpose of the work

Traditional information retrieval mechanisms focus on the content of a document, that is, the issues that the document raises. Just as critical in many intelligence and Influence Operations contexts is the stance that the author takes with respect to those issues, or how that author feels about them. The field of automated opinion mining and sentiment analysis (OMSA) uses natural language processing and machine learning techniques to classify documents into positive and negative classes. The objective is to help analysts identify critical documents to study within large collections and to provide a global view of the sentiment expressed across the documents.

The purpose of this work was to introduce a sentiment analysis mechanism into the HanDles document visualization tool.

1.1.1 Integrating OMSA into the HanDles Tool

HanDles provides a search interface to a document set and three visualizations that allow users to quickly assimilate large document collections. The first of these visualizations is a typical results set, such as would be returned by an Internet search engine like Google or Bing. In addition, however, HanDles provides a set of automatically generated tags, known as handles, which allow the user to quickly select subsets of documents of interest. The second visualization plots a projection of a semantic space derived from the documents. Both handles and documents are plotted with proximity coding for semantic similarity. Rather than providing a static display, however, HanDles lets the user drag handles and documents to interactively modify the view of the space. Finally, a timeline view plots the popularity of different handles as a function of time.
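The report does not describe how HanDles builds its semantic space, so what follows is only a hedged illustration of what "proximity coding for semantic similarity" can mean. In this sketch, two documents count as close when their word-count vectors point in similar directions. It assumes plain bag-of-words counts and cosine similarity. The function names `term_vector` and `cosine` are hypothetical and are not part of HanDles.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words count vector for a text (lower-cased, letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 1.0 means same direction."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


battery = term_vector("battery life is great")
battery2 = term_vector("the battery life is short")
support = term_vector("customer support was rude")
# The two battery reviews share vocabulary, so a display would draw them closer together.
print(cosine(battery, battery2) > cosine(battery, support))  # prints True
```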

To integrate OMSA into the tool, two special handles titled "Positive Sentiment" and "Negative Sentiment" were added. The user can choose to display these handles by clicking on a sentiment link in the Search Options box on the main results page of the tool. The positive and negative handles then become available in all three views of the interface. In the document view, they allow the user to highlight either the positive or the negative documents so that these can be scanned quickly. This view also provides counts of the positive and negative documents, allowing the user to rapidly assess the general sentiment in the document set towards the topic they searched for.
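Here is a minimal sketch of the positive and negative counts shown in the document view. It assumes each document has already received a signed classifier score, where a positive value means positive sentiment. The threshold of 0 and the name `sentiment_summary` are illustrative assumptions, not HanDles internals.

```python
def sentiment_summary(scores, threshold=0.0):
    """Tally documents by the sign of their classifier score.

    scores: decision values from a sentiment classifier (hypothetical input);
    a score >= threshold is counted as positive.
    """
    positive = sum(1 for s in scores if s >= threshold)
    negative = len(scores) - positive
    share_positive = positive / len(scores) if scores else 0.0
    return {"positive": positive, "negative": negative, "share_positive": share_positive}


print(sentiment_summary([1.2, -0.4, 0.3, -2.0]))
# prints {'positive': 2, 'negative': 2, 'share_positive': 0.5}
```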

In the space view, handles expressing content issues can be arranged around the screen. For instance, if you were a market researcher examining reviews of the Amazon Kindle, you might arrange issues like page turning, battery life, customer support, and other aspects around the screen. Selecting one of the sentiment-labeled handles then highlights the documents classified with that sentiment. The distribution of those highlighted documents makes it visually apparent how sentiment is distributed across those issues. In the Kindle example, this would allow you to quickly determine not only that customer support is an issue of interest to the reviewers, but also whether they were positively or negatively disposed towards Amazon's support. The way in which handles cluster also highlights hot spots that might require more detailed study. At any time, a user can click directly on the dot representing a document to deepen their investigation.

In the timeline view, the positive and negative handles can be used to trace how sentiment has changed over time. This function is critical for monitoring the success of operations that attempt to change the sentiment expressed by authors.
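Tracing sentiment over time amounts to grouping classified documents by date. The sketch below assumes each document carries a date and a +1/-1 sentiment label, and it reports the share of positive documents per calendar month. The monthly granularity and the function name `sentiment_by_month` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def sentiment_by_month(docs):
    """Share of positive documents per calendar month.

    docs: iterable of (date, label) pairs, label +1 (positive) or -1 (negative).
    Returns a dict {(year, month): fraction_positive} in chronological order.
    """
    counts = defaultdict(lambda: [0, 0])  # [positive, total] per month
    for day, label in docs:
        key = (day.year, day.month)
        counts[key][1] += 1
        if label > 0:
            counts[key][0] += 1
    return {key: pos / total for key, (pos, total) in sorted(counts.items())}


docs = [(date(2012, 1, 5), 1), (date(2012, 1, 20), -1), (date(2012, 2, 3), 1)]
print(sentiment_by_month(docs))  # prints {(2012, 1): 0.5, (2012, 2): 1.0}
```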

In many cases, operating at the level of entire documents (news articles, movie reviews, etc.) is too coarse. Many issues may be raised within a document, and the sentiment expressed towards each of them may differ. A movie reviewer might have liked the plot but thought that the acting left something to be desired, for instance. To capture this, HanDles allows one to divide documents into subdocuments of fixed length. The main results screen provides a box in which one can specify how many words each of these document fragments will contain.
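Fixed-length division can be sketched as follows. The sketch assumes whitespace tokenization. It also assumes that a shorter final fragment is kept rather than discarded, since the report does not say how HanDles treats the remainder.

```python
def split_fixed(text, size=50):
    """Split a document into consecutive fragments of at most `size` words.

    Words are whitespace-separated tokens; the last fragment may be shorter.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


review = " ".join(f"w{i}" for i in range(120))  # a stand-in 120-word review
print([len(fragment.split()) for fragment in split_fixed(review, 50)])  # prints [50, 50, 20]
```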

1.1.2 The Classifier

Using the sentiment mechanisms deployed in the HanDles tool effectively depends on the accuracy of the sentiment classifier. There has been a recent surge in work on sentiment classification, driven in part by the availability of large sets of documents labeled with sentiment. In particular, Pang and Lee (2004) provided a movie review dataset consisting of 1000 positive and 1000 negative reviews crawled from the Internet Movie Database (IMDB) movie archive, with an average length of 30 sentences.

The algorithm we used to classify documents into positive and negative classes is called a Support Vector Machine (SVM). Simply put, an SVM examines a large collection of documents that have already been classified as positive or negative, and tries to work out the function that best differentiates them. After training, the SVM uses the function it has learned to classify new documents as either positive or negative in tone or sentiment.
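The work used SVM solvers from the Shogun package. The sketch below instead swaps in Pegasos, a simple stochastic sub-gradient method for training a linear SVM, written in plain Python. It shows how the SVM works out a linear function that separates the two classes. It is an illustration under stated assumptions (dense feature vectors, labels of +1 and -1, and a bias term that is regularized along with the weights), not the classifier used in HanDles.

```python
import random


def train_pegasos(X, y, lam=0.1, epochs=200, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos.

    X: list of feature vectors (lists of floats); y: labels in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (here the bias is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Sub-gradient of the L2 penalty: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward classifying x correctly.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of the learned linear function for feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def predict(w, x):
    """Predicted sentiment: +1 (positive) or -1 (negative)."""
    return 1 if decision(w, x) >= 0 else -1


# Hypothetical features: [count of positive words, count of negative words].
X = [[3, 0], [2, 1], [4, 0], [0, 3], [1, 2], [0, 4]]
y = [1, 1, 1, -1, -1, -1]
w = train_pegasos(X, y)
print(predict(w, [5, 0]), predict(w, [0, 5]))  # prints 1 -1
```

After training, the sign of the learned linear function gives the predicted sentiment of a new document's feature vector.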

To maximize accuracy, generalization and speed, we used only words from a polarity dictionary (He, Lin and Alani, 2011) that appeared at least 50 times in the corpus, and employed a purely discriminative classifier. We initially tried libSVM from the Shogun machine learning package (Sonnenburg et al., 2010). With it, we obtained an accuracy of 89.7% on positive examples and 65.2% on negative examples, for an average of 77.1%. Because performance on the negative examples was low, we switched to the Generalized Minimum Norm Problem SVM (Franc, 2005). This classifier produced accuracies of 87.6% on positive examples and 83.8% on negative examples, for an acceptable average performance of 85.7%.
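The feature selection described above can be sketched as a frequency filter over a polarity lexicon. The lexicon, the tokenization, and the use of raw counts as feature values are all assumptions. The report does not say whether counts or binary presence were used.

```python
from collections import Counter


def select_features(tokenized_docs, polarity_lexicon, min_count=50):
    """Keep only lexicon words that occur at least `min_count` times in the corpus."""
    freq = Counter(token for doc in tokenized_docs for token in doc)
    return sorted(word for word in polarity_lexicon if freq[word] >= min_count)


def to_vector(doc_tokens, vocabulary):
    """Represent a tokenized document as counts over the selected vocabulary."""
    counts = Counter(doc_tokens)
    return [counts[word] for word in vocabulary]


corpus = [["good", "plot", "good"], ["bad", "acting"], ["good", "bad", "dull"]]
lexicon = {"good", "bad", "dull", "superb"}  # hypothetical polarity lexicon
vocab = select_features(corpus, lexicon, min_count=2)  # the report used 50
print(vocab)                                       # prints ['bad', 'good']
print(to_vector(["good", "good", "dull"], vocab))  # prints [0, 2]
```

The toy example uses a threshold of 2 only because its corpus is tiny. The report's threshold of 50 assumes a corpus the size of the 2000-review dataset.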

It is worth noting that some authors have reported accuracy in excess of 95% on this dataset using an approach that combines a generative graphical model with an SVM (He, Lin & Alani, 2011). Part of the contract under which this work was carried out involved an attempt to reproduce the high accuracy reported by He et al. Several attempts using different parameterizations failed to reproduce it. As a result, those enhancements to the classifier were not implemented in the final product discussed here.
1.1.3 Evaluation


As an initial test, we tried the sentiment analysis mechanism on a set of reviews of the Amazon Kindle extracted from the Amazon website. It quickly became apparent that the classifier did not generalize well to these documents, since many misclassifications were immediately obvious. We do not have ground truth labels with which to quantify performance on this dataset, but it is clear that generalization across domains will be difficult. Note that we used a restricted vocabulary of polarity-laden terms, which should help to improve generalization, and yet performance was disappointing. In the future, it may be necessary to train classifiers for specific domains and/or to improve the vocabulary of polarity words.
Figure 1 Documents View: 67 “Star Wars” movie reviews have been split into 50-word documents, resulting in 1214 HanDles documents.
Figure 2 The Space View has been organized by Negative and Positive Sentiment.

Next, we ran a query on the sentiment training collection itself (the Pang and Lee, 2004, movie reviews), choosing to examine sentiment towards the movie Star Wars. The initial query returned 67 documents, and the handles generated for them focused mainly on the actors and miscellaneous issues. Consequently, we divided these documents into shorter segments of 50 words. The resulting set of 1214 documents and 30 handles was much easier to interpret (see Figure 1). Dividing documents is clearly a critical capability. In the future, it would be worthwhile to investigate more sophisticated methods of division, such as segmenting at sentence boundaries.
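What follows is one possible version of the sentence-boundary segmentation suggested above, offered only as a sketch. It detects sentences with a naive punctuation rule and then packs them into segments of about 50 words, never splitting a sentence. The splitting rule is an assumption. It would mishandle abbreviations such as “mr .” in the tokenized review data.

```python
import re


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def segment_by_sentences(text, max_words=50):
    """Pack whole sentences into segments of at most about `max_words` words.

    A sentence is never cut in half; a sentence longer than `max_words`
    becomes a segment on its own.
    """
    segments, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and current_len + n > max_words:
            segments.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        segments.append(" ".join(current))
    return segments


text = "The plot was clever. The acting was wooden! Would I watch it again? Yes."
print(segment_by_sentences(text, max_words=8))
# prints ['The plot was clever. The acting was wooden!', 'Would I watch it again? Yes.']
```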

Figure 2 shows the space view organized by sentiment. This was accomplished by “freezing”, or locking, the Positive Sentiment and Negative Sentiment handles in opposite regions of the display. These sentiment handles were then clicked repeatedly, or “pumped”, to pull related documents closer to them. As can be seen in Figures 3 through 5, sentiment can now be explored by clicking on the sentiment handles and other handles of interest in the display. We find that documents about both “George Lucas” and “Obi Wan Kenobi” frequently carry the Negative Sentiment handle. By contrast, “Special Effects” documents generally carry the Positive Sentiment handle.
Figure 3 Negative Sentiment and “George Lucas”

One of the difficulties that arises when attempting to extract sentiment from movie review documents is the context of the sentiment. A good example of this problem is displayed in Figure 6. The text fragment that has been opened reads:

“senate to make an appeal for justice . on tatooine , qui gon discovers a young slave boy , anakin skywalker ( jake lloyd ) , who not only can help them get the parts they need , but displays uncanny intelligence , insight , and instincts . qui gon senses the”

In this example, the classifier has labeled the document as containing Positive Sentiment. To some extent this is correct, because “uncanny intelligence”, “insight” and “instincts” are desirable, positive qualities to attribute to someone. The problem with the classification lies in the context, or the role of the author. In this case, the author adopts the role or voice of a “story teller” recounting the plot, rather than the voice of a movie reviewer whose sentiment and opinions we were interested in polling. While the problem of distinguishing the author's voice is more likely to occur in reviews of movies or books, it could also arise in other datasets.
Figure 4 Negative Sentiment and “Obi Wan Kenobi”

Another potential problem, unique to the HanDles interface, is displayed in Figure 7. Ideally, a request to HanDles for “Star Wars” movie reviews would return only reviews of the six Star Wars movies. However, this is not the case. Pang and Lee's (2004) dataset contains reviews of many different movies, some of which are not Star Wars movies but do reference them. For example, one of the reviews of the film Toy Story mentions that film's parody of Star Wars. Another review compares the film Starship Troopers to Star Wars, saying that “Starship Troopers is very reminiscient [sic] of star wars , another kick-ass space opera”. Returning these documents is not necessarily a bad thing, since the user may wish to see all references to a subject. Even so, users should bear in mind that HanDles may return documents of this kind.

Pang and Lee's (2004) dataset does not associate dates with the individual movie reviews, so it was not possible to view sentiment in the Timegraph view. This was unfortunate, as viewing handles (of which Sentiment is one) over time is a very informative feature of the HanDles application.
Figure 5 Positive Sentiment and “Special Effects”.


Figure 6 The potential problem of context for Sentiment Classification
Figure 7 The problem of “non-Star Wars” reviews that mention “Star Wars”

It should not come as a surprise that, in the above test of the capability, we obtained acceptable results when we used movie reviews taken from the training set. In a second test of the system, we selected 30 movie reviews written by IMDB users. Fifteen of the reviews were taken from a list of the worst movies ever made (LOL, Jack And Jill, Meet The Spartans, Titanic, Dragonball, Epic Movie, Vampires Suck, Spy Kids, Disaster Movie, Catwoman, Bucky Larson, The Room, Battlefield Earth, Superbabies, The Hottie And The Nottie) and 15 from a list of the best movies ever made (Shawshank Redemption, The Dark Knight, Lord of the Rings, Inception, Se7en, Spirited Away, A Separation, Toy Story 3, The Lives Of Others, The Untouchables, Wall-E, Braveheart, Pan's Labyrinth, Batman Begins, Gran Torino). Figure 8 shows the film titles in the HanDles map. As a next step, we moved the PositiveSentiment and NegativeSentiment handles to opposite corners of the screen. Doing so caused each movie-title handle on the screen to follow the sentiment handle with which it was most strongly associated. Then, to separate the titles further, we repeatedly clicked, or “pumped”, the positive and negative sentiment handles to draw associated titles closer. As is clear in Figure 9, HanDles does an impressive job of differentiating good and poor reviews. HanDles did make two interesting misclassifications, however. It mistakenly classified the review of the Clint Eastwood film Gran Torino as negative, and the review of the Adam Sandler flop Jack and Jill as positive (see Figure 10). An examination of the narrative in the reviews clarifies how the misclassifications occurred. Gran Torino is a film about a grumpy, foul-mouthed man, and as such, its review contains many terms that are generally associated with negative sentiment, for example, foul, racist, dirty (as in Dirty Harry), complaints, and garbage. In essence, the negative terms contained in the text misled the classifier.
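The 93.3% accuracy reported in the Executive summary follows directly from these two errors among the 30 reviews. Because there was one error in each group of 15, the per-class accuracy is the same. The short calculation below shows both.

```python
def accuracy_percent(n_correct, n_total):
    """Percentage of documents classified correctly."""
    return 100.0 * n_correct / n_total


reviews = 30        # 15 "worst" and 15 "best" IMDB user reviews
misclassified = 2   # Gran Torino and Jack and Jill
print(round(accuracy_percent(reviews - misclassified, reviews), 1))  # prints 93.3
print(round(accuracy_percent(14, 15), 1))  # one error per class of 15; prints 93.3
```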

“With his performance Eastwood shows you why people like himself, Jack Nicholson, or Paul Newman only come around once in a lifetime. Though Eastwood would rather focus on directing, he can still carry a movie with his on screen presence, and he's pure dynamite in "Gran Torino". Perhaps the poor box office results of "hollywoody" movies like Absolute Power, True Crime, Space Cowboys, and Blood Work, caused Eastwood to shy away from acting, but given cutting edge material to work with as "Million Dollar Baby" and "Gran Torino", he's as good as ever. His character as the racist and salty war vet makes you think of that old guy we've all had on our blocks with the garbage (sic) door open, the million tools hanging everywhere, and always fixing or building something. I found myself not wanting the movie to end because the scenes between himself and the various Hmong characters where priceless. There may be complaints over the racist remarks and scenes, but Eastwood pulls it off in a way a real person like that would talk or act to a point where it ends up being lighthearted. I'm not going to give the plot away, but if you like your Clint Eastwood as a hard-nosed tough guy with foul language alia Dirty Harry or Heartbreak Ridge, you'll love this movie!!”

On the other hand, the film Jack and Jill was classified as containing positive sentiment, despite the review being clearly scathing.

“I will start by saying that I have enjoyed many Adam Sandler movies and have found him to be a generally funny guy when I've seen him interviewed and when he was on Saturday Night Live. I laughed gleefully through Anger Management, Mr. Deeds, Billy Madison, 50 First Dates, and Happy Gilmore. Funny People and Just Go With It were awesome movies! I was brought to tears in Sandler's emotional portrayal in Reign Over Me. I have great respect for the man as a comedian and actor. But Jack and Jill is abysmal. The "jokes" are not only bizarrely misplaced - THEY ARE NOT FUNNY. I did not hear a single laugh, not even a slight giggle from any audience member in the theater. In fact, almost 1/3 of them walked out before it was over. Those who stayed, openly derided the flick as we all exited the theater in utter disgust and sadness. I don't know why Al Pacino was in this movie, his acting made it seem like he was forced at gun point to do this movie. Nick Swardson and Tim Meadows are way too funny to be in such a disaster. Especially given Swardson's stellar performance in Just Go With It. This movie is not a flop, its not an "oops", its not a mistake - it's a career ending pile of trash. A career ender that started with Sandler's god awful "Grown Ups" and climaxes with this revolting hunk of garbage. Sorry Adam, it's over.”

Here again, the film's review is clearly negative, but the document contains several positive terms, such as funny, laughed, gleefully, Happy, respect, laugh, giggle, and stellar. Interestingly, one occurrence of “funny” is in the phrase “NOT FUNNY”. A document's classification is made by weighing its positive and negative aspects, and in this case the review expressed enough positive sentiment to mislead the classifier.
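The weighing described above can be illustrated with a bag-of-words count over a tiny, hypothetical polarity lexicon, which is not the He, Lin and Alani dictionary. Because each word is counted independently, negated phrases such as “NOT FUNNY” still add to the positive tally.

```python
import re

# Tiny hypothetical lexicon for illustration only.
POSITIVE = {"funny", "laughed", "gleefully", "happy", "respect", "laugh", "giggle", "stellar"}
NEGATIVE = {"abysmal", "disaster", "trash", "garbage", "revolting", "disgust", "awful"}


def polarity_counts(text):
    """Count positive and negative lexicon words, each token independently."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for t in tokens if t in POSITIVE)
    negative = sum(1 for t in tokens if t in NEGATIVE)
    return positive, negative


excerpt = "THEY ARE NOT FUNNY. I did not hear a single laugh, not even a slight giggle"
print(polarity_counts(excerpt))  # prints (3, 0): the negations are ignored
```

A trained SVM weighs learned per-word weights rather than raw counts. Even so, any unigram bag-of-words representation that does not model negation has the same blind spot.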
Figure 8 HanDles map of 30 film reviews from IMDB users
With his performance Eastwood shows you why people like himself, Jack
Nicholson, or Paul Newman only come around once in a lifetime. Though
Eastwood would rather focus on directing, he can still carry a movie with his
on screen presence, and he's pure dynamite in "Gran Torino". Perhaps the
poor box office results of "hollywoody" movies like Absolute Power, True
Crime, Space Cowboys, and Blood Work, caused Eastwood to shy away from
acting, but given cutting edge material to work with as "Million Dollar Baby"
and "Gran Torino", he's as good as ever. His character as the racist and salty
war vet makes you think of that old guy we've all had on our blocks with the
garbage door open, the million tools hanging everywhere, and always fixing or
building something. I found myself not wanting the movie to end because the
scenes between himself and the various Hmong characters where priceless.
There may be complaints over the racist remarks and scenes, but Eastwood
pulls it off in a way a real person like that would talk or act to a point where it
ends up being lighthearted. I'm not going to give the plot away, but if you like
your Clint Eastwood as a hard-nosed tough guy with foul language ala Dirty
Harry or Heartbreak Ridge, you'll love this movie!
Figure 10 HanDles misclassified two reviews: Gran Torino and Jack and Jill.
2 Conclusion and Recommendations


The most important finding of this report is that, in future, classifiers will need to be trained for
specific domains, the vocabulary of polarity words will need to be improved, or both, whether the
classifier is HanDles or any other mechanism for automatically classifying documents by sentiment. In
particular, the classifier should be trained on documents similar to those that will be examined in
operational settings. For example, in an Influence Operations context, CF analysts may be interested in
measuring the positive and negative sentiment expressed by locals toward local governance or the
Canadian Forces. Movie reviews are not the ideal class of document with which to train such a
sentiment analysis system; a large collection of text written for and by members of the local
population would be more appropriate. To this end, it is important to decide at the beginning of an
influence operation what kinds of documents from the area of operations should be gathered and
submitted to the system.

Another conclusion from this work is that the ability to subdivide documents is a critical capability
of sentiment analysis in HanDles, so further development in this area would benefit the system.
Traditional sentiment analysis assigns a positive or negative classification to a whole document. As
demonstrated above, however, such coarse analysis often fails to capture which aspects of the topics
under discussion are written about positively and which negatively. For example, the review of Jack
and Jill above is decidedly negative, yet it contains positive sentiment when discussing the cast.
Because HanDles can subdivide documents, it can detect the sentiment associated with the various
topics discussed in a document. This capability is uncommon in tools of this sort, but it is critical
for accurate situational awareness and for measuring a campaign's effectiveness in operational
settings.
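One way to picture document subdivision is to split a document into segments, attribute each segment to the topics it mentions, and accumulate sentiment per topic. The sketch below rests on assumptions not taken from HanDles: sentence-sized segments, a small hypothetical polarity lexicon (`POLARITY`), and hypothetical topic keyword sets (`TOPICS`). This report does not specify how HanDles subdivides documents, so treat it only as an illustration of the idea, applied to sentences from the Jack and Jill review.

```python
import re
from collections import defaultdict

# Hypothetical polarity lexicon (+1 positive, -1 negative); not HanDles' own.
POLARITY = {"funny": 1, "stellar": 1, "respect": 1, "awesome": 1,
            "abysmal": -1, "disaster": -1, "disgust": -1, "trash": -1}

# Hypothetical topic keywords: a segment mentioning any of them is
# attributed to that topic.
TOPICS = {"film": {"jill", "movie", "flick"},
          "cast": {"swardson", "meadows", "pacino"}}

def split_segments(text):
    """Naively split a document into sentence-sized segments."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def topic_sentiment(text):
    """Sum segment polarity separately for each topic the segment mentions."""
    totals = defaultdict(int)
    for segment in split_segments(text):
        tokens = re.findall(r"[a-z]+", segment.lower())
        polarity = sum(POLARITY.get(tok, 0) for tok in tokens)
        for topic, keywords in TOPICS.items():
            if keywords.intersection(tokens):
                totals[topic] += polarity
    return dict(totals)

REVIEW = (
    "But Jack and Jill is abysmal. "
    "Those who stayed, openly derided the flick as we all exited the theater "
    "in utter disgust and sadness. "
    "Nick Swardson and Tim Meadows are way too funny to be in such a disaster. "
    "Especially given Swardson's stellar performance in Just Go With It."
)

print(topic_sentiment(REVIEW))  # {'film': -2, 'cast': 1}
```

The film scores negative while the cast scores positive; a single whole-document label (here the four segment polarities sum to -1) would hide that distinction.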

HanDles is ready for exploitation. The next step in its transition to operational use will be to
assemble an appropriate collection of documents for training the system to distinguish documents
expressing positive sentiment from those expressing negative sentiment. To do so, however, DRDC and CF
personnel working in Intelligence and Influence Activities will need to decide together where those
documents will come from.

Sentiment is only one of several kinds of classification that can be applied to documents, and other
forms of discrimination could be added to the tool in future. For example, the SVM in HanDles could be
trained to distinguish threatening from non-threatening narrative in blogs, or violent from non-violent
events described in intelligence or situation reports. The SVM is a generic classifier: it can be
trained to classify documents along any dimension, provided that the documents in the training set
have ground-truth labels.
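Because an SVM needs only labelled examples, the same training procedure applies whatever the labels mean. The sketch below trains a linear SVM by minimising the L2-regularised hinge loss with Pegasos-style sub-gradient steps on bag-of-words features, cycling through the data in a fixed order (rather than sampling at random) so the result is reproducible. It is a stand-in written for illustration, not HanDles' SVM implementation, and the four training sentences and their threatening/non-threatening labels are hypothetical toy data.

```python
import re
from collections import Counter

def features(text):
    """Bag-of-words counts plus a constant feature that acts as the bias."""
    feats = Counter(re.findall(r"[a-z]+", text.lower()))
    feats["__bias__"] = 1
    return feats

def dot(w, x):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(docs, labels, lam=0.01, epochs=20):
    """Minimise lam/2*||w||^2 + mean hinge loss with Pegasos-style
    sub-gradient steps; labels must be +1 or -1."""
    xs = [features(d) for d in docs]
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            t += 1
            eta = 1.0 / (lam * t)        # decreasing step size
            margin = y * dot(w, x)       # computed before the update
            for k in w:                  # gradient step for the L2 regulariser
                w[k] *= 1.0 - eta * lam
            if margin < 1:               # hinge loss is active for this example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict(w, text):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if dot(w, features(text)) > 0 else -1

# Hypothetical toy training set: +1 = threatening, -1 = non-threatening.
TRAIN_DOCS = [
    "attack the convoy tonight",
    "market reopens after repairs",
    "burn every checkpoint",
    "school festival on friday",
]
TRAIN_LABELS = [1, -1, 1, -1]

model = train_linear_svm(TRAIN_DOCS, TRAIN_LABELS)
print(predict(model, "they plan to attack at dawn"))       # 1
print(predict(model, "the school festival is cancelled"))  # -1
```

Retargeting the classifier (for example, to violent versus non-violent events) changes only the labelled training set, not the code. For simplicity the bias is folded in as a constant feature and regularised along with the other weights, a common shortcut.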